Abstract — A framework with two scalar parameters is introduced 
for various problems of finding a prefix code minimizing a 
coding penalty function. The framework encompasses problems 
previously proposed by Huffman, Campbell, Nath, and Drmota 
and Szpankowski, shedding light on the relationships among these 
problems. In particular, Nath's range of problems can be seen as 
bridging the minimum average redundancy problem of Huffman 
with the minimum maximum pointwise redundancy problem of 
Drmota and Szpankowski. Using this framework, two linear-time 
Huffman-like algorithms are devised for the minimum maximum 
pointwise redundancy problem, the only one in the framework not 
previously solved with a Huffman-like algorithm. Both algorithms 
provide solutions common to this problem and a subrange of 
Nath's problems, the second algorithm being distinguished by its 
ability to find the minimum variance solution among all solutions 
common to the minimum maximum pointwise redundancy and 
Nath problems. Simple redundancy bounds are also presented. 

Index Terms — Huffman algorithm, minimax redundancy, optimal 
prefix code, Renyi entropy, unification. 



I. INTRODUCTION 

A source emits symbols drawn from the alphabet $\mathcal{X} = \{1, 2, \ldots, n\}$. Symbol $i$ has probability $p_i$, thus defining probability mass function vector $p$. We assume without loss of generality that $p_i > 0$ for every $i \in \mathcal{X}$, and that $p_i \le p_j$ for every $i > j \in \mathcal{X}$. The source symbols are coded into binary codewords. Each codeword $c_i$ corresponding to symbol $i$ has length $l_i$, thus defining length vector $l$.

It is well known that Huffman coding [1] yields a prefix code minimizing $\sum_{i \in \mathcal{X}} p_i l_i$ given the natural coding constraints: the integer constraint, $l_i \in \mathbb{Z}_+$, and the Kraft (McMillan) inequality [2]:

$$\sum_{i \in \mathcal{X}} 2^{-l_i} \le 1.$$

Hu, Kleitman, and Tamaki [3] and Parker [4] independently ex- 
amined other cases in which Huffman-like algorithms were op- 
timal; this work was later extended [5], [6]. Other modifications 
of the Huffman coding problem were considered in analytical 
papers [7]-[9], although none of these proposed a Huffman-like algorithmic solution. In each paper, relationships between 
the modified problem and the Huffman coding problem were 
explored. Parker proposed an algorithmically-motivated two-function parameterization defining various Huffman coding 
problems; these two parameter functions are a "weight combi- 
nation" function and a "tree cost" function [4]. Three problems, 
first examined in [1], [7], [8], were considered as a part of this 
framework; here we show that a fourth [9] fits into it as well. 
In addition, we find a simpler redundancy-motivated unifying 
problem class that relates the four problems, one involving two 
scalar parameters rather than two functional parameters. This 
new framework reveals a united analytical structure, including 
simple redundancy bounds and novel algorithmic results which 
improve upon the algorithm of [9]. 

In Section II, background is given on the coding problem introduced in [7]. In Section III, the new framework, based on an extension of this problem, is introduced. The problem and the three other aforementioned problems are then put into the context of this framework. In Section IV, the framework is used to help find linear-time algorithms for the problem in [9]. Redundancy bounds are presented in Section V, with concluding thoughts following in Section VI.

II. Background: Exponential Huffman coding 

One particular application of a modified coding problem was found by Humblet [10] for a problem involving minimization of buffer overflow in communications. In this application, the function minimized is $\sum_{i \in \mathcal{X}} p_i 2^{\beta l_i}$ for a given $\beta > 0$. This is easily generalized to negative $\beta$ by specifying minimization of the $\beta$-exponential average

$$F_\beta(p, l) = \frac{1}{\beta} \log_2 \sum_{i \in \mathcal{X}} p_i 2^{\beta l_i}. \tag{1}$$

This problem was originally proposed by Campbell [7], and a linear-time algorithm was found independently by Hu et al. in [3, p. 254], Parker in [4, p. 485], and Humblet in [11, p. 25] (later published as [10, p. 231]). This algorithm covers all of $\mathbb{R}$; the case of $\beta = 0$ is considered by noting that $\beta \to 0$ yields the original Huffman coding problem.

Below is the procedure for the exponential extension of Huffman coding with parameter $\beta$. Note that it minimizes $F_\beta(w, l) = \frac{1}{\beta} \log_2 \sum_{i \in \mathcal{X}} w_i 2^{\beta l_i}$ over $l$, even if the "probabilities" do not add to 1. We refer to such arbitrary positive inputs as weights, often denoted by $w = \{w_i\}$ instead of $p = \{p_i\}$:

Procedure for Exponential Huffman Coding 

1) Each item $m_i \in \{m_1, m_2, \ldots, m_n\}$ has weight $w_i \in \mathcal{W}_{\mathcal{X}}$, where $\mathcal{W}_{\mathcal{X}}$ is the set of all such weights. (Initially, $m_i = i$.) Assume each item $m_i$ has codeword $c_i$, to be determined later.

2) Combine the items with the two smallest weights $w_j$ and $w_k$ into one item $\tilde{m}_j$ with the combined weight $\tilde{w}_j = 2^{\beta}(w_j + w_k)$. This item has codeword $\tilde{c}_j$, to be determined later, while $m_j$ is assigned codeword $c_j = \tilde{c}_j 0$ and $m_k$ codeword $c_k = \tilde{c}_j 1$. Since these have been assigned in terms of $\tilde{c}_j$, replace $w_j$ and $w_k$ with $\tilde{w}_j$ in $\mathcal{W}$ to form $\mathcal{W}_{\tilde{\mathcal{X}}}$.
3) Repeat procedure, now with the remaining $n - 1$ codewords and corresponding weights in $\mathcal{W}$, until only one item is left. The weight of this item is $\sum_{i \in \mathcal{X}} w_i 2^{\beta l_i}$. All codewords are now defined by assigning the null string to this trivial item.
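The three steps above can be sketched in Python. This is a minimal illustration under our own assumptions rather than the linear-time method of [12]: it uses a binary heap (Python's `heapq`), so it needs O(n log n) heap operations (the leaf-list bookkeeping adds more in the worst case), and it records which original items sit under each merged item so that every merge adds one bit to each of their codewords. The function name `exponential_huffman` is hypothetical.

```python
import heapq
import itertools
import math


def exponential_huffman(weights, beta):
    """Exponential Huffman coding: return (lengths, final_weight).

    The lengths minimize F_beta(w, l) = (1/beta) log2 sum_i w_i 2**(beta*l_i)
    (for beta < 0 this means maximizing the sum); final_weight is the value
    of sum_i w_i 2**(beta*l_i) at the returned lengths.
    """
    n = len(weights)
    if n == 1:
        return [0], weights[0]
    lengths = [0] * n
    tie = itertools.count()  # unique tie-breaker, so heapq never compares lists
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    factor = 2.0 ** beta
    while len(heap) > 1:
        # Step 2: combine the two items of smallest weight.
        w_j, _, leaves_j = heapq.heappop(heap)
        w_k, _, leaves_k = heapq.heappop(heap)
        merged = leaves_j + leaves_k
        for i in merged:  # every leaf below the new item gains one bit
            lengths[i] += 1
        heapq.heappush(heap, (factor * (w_j + w_k), next(tie), merged))
    return lengths, heap[0][0]


# Weights of Figure 1 with beta = log2(1.1): each merge multiplies by 1.1.
lengths, total = exponential_huffman([0.36, 0.30, 0.20, 0.14], math.log2(1.1))
print(lengths, round(total, 6))
```

On the Figure 1 weights this prints four codeword lengths of 2 and the minimized value 1.21; with `beta = 0.0` the same function performs ordinary Huffman coding and returns lengths (1, 2, 3, 3) instead.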

This algorithm can be modified to run in linear time (to input size) given sorted weights in the same manner as Huffman coding [12]. An example of exponential Huffman coding for $\beta = \log_2 1.1$ is shown in Figure 1. The resulting code is different from that which would be obtained via Huffman coding ($\beta = 0$).

The output of a Huffman-like algorithm might be a code (and 
thus the implicit code tree, e.g., [13]) or merely codeword 
lengths; we assume the latter from here on, because valid 
codewords can be inferred from the lengths. Thus we can view 
the problem such an algorithm solves as an integer optimization 
problem. This is useful because many different codes can 
correspond to the same set of codeword lengths and thus all 
be optimal for a given problem. 
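One standard way to infer valid codewords from lengths (an illustration we chose, not something the text prescribes) is a canonical code: sort symbols by length and hand out consecutive binary numbers, shifting left whenever the length grows. The sketch assumes the lengths satisfy the Kraft inequality and checks this exactly with `fractions.Fraction`; `codewords_from_lengths` is a hypothetical name.

```python
from fractions import Fraction


def codewords_from_lengths(lengths):
    """Assign canonical binary codewords to a length vector obeying Kraft."""
    if sum(Fraction(1, 2 ** l) for l in lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])  # stable
    codes = [""] * len(lengths)
    code, prev_len = 0, 0
    for rank, i in enumerate(order):
        l = lengths[i]
        if rank > 0:
            code += 1  # next unused codeword at the previous length
        code <<= l - prev_len  # pad with zeros up to the new length
        codes[i] = format(code, "b").zfill(l) if l > 0 else ""
        prev_len = l
    return codes


print(codewords_from_lengths([2, 2, 2, 2]))
print(codewords_from_lengths([1, 2, 3, 3]))
```

For the all-2 lengths of Figure 1 this yields 00, 01, 10, 11, matching the codewords shown there; many other codes share the same lengths and are equally optimal.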

Considering the codeword lengths alone as the solution to a 
given problem, we find that some problems have a unique 
optimizing set of lengths, while others have more than one 
distinct optimal solution. Multiple different solutions manifest 
themselves in the algorithm as possible ties in the weight 
of (possibly combined) items in the combination step (step 
2 above). Thus the algorithm, as with Huffman coding, is 
nondeterministic. Two deterministic variants are bottom-merge 
Huffman coding and top-merge Huffman coding [13]. Code 
trees yielded from the former method have been called, depend- 
ing on the properties focused upon, best Huffman trees [14], 
compact Huffman trees [15], minimal Huffman trees [16], 
and minimum variance Huffman trees [17], the last of these 
because variance is minimized among (tied) optimal code trees 
(codeword lengths). 
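To make the two deterministic variants concrete, here is a sketch of ordinary Huffman coding with an explicit tie-breaking key. Under our assumptions, bottom-merge lets original (older) items win ties over compound (newer) items, and top-merge does the reverse; the distribution p = (0.4, 0.2, 0.2, 0.1, 0.1) is our own choice, picked because it produces ties. The name `huffman_lengths` is hypothetical.

```python
import heapq
import itertools


def huffman_lengths(p, top_merge=False):
    """Huffman codeword lengths with deterministic tie-breaking.

    bottom-merge (default): among equal weights, older items (original
    leaves before compound items) are combined first.
    top-merge: among equal weights, newer (compound) items are combined first.
    """
    lengths = [0] * len(p)
    stamp = itertools.count()

    def key(w):
        s = next(stamp)
        return (w, -s if top_merge else s)  # unique key; lists never compared

    heap = [(key(w), [i]) for i, w in enumerate(p)]
    heapq.heapify(heap)
    while len(heap) > 1:
        (w_j, _), leaves_j = heapq.heappop(heap)
        (w_k, _), leaves_k = heapq.heappop(heap)
        for i in leaves_j + leaves_k:
            lengths[i] += 1
        heapq.heappush(heap, (key(w_j + w_k), leaves_j + leaves_k))
    return lengths


p = [0.4, 0.2, 0.2, 0.1, 0.1]
print(huffman_lengths(p), huffman_lengths(p, top_merge=True))
```

Both length vectors have the same average length, 2.2, but the bottom-merge lengths (2, 2, 2, 3, 3) have variance 0.16 while the top-merge lengths (1, 2, 3, 4, 4) have variance 1.36, which illustrates the minimum-variance property of bottom-merge trees.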

Given $b \in \mathbb{R}$ and $p$, if we relax the integer constraint on $l$, minimizing $F_b(p, l)$ becomes a simple numerical optimization and provides a lower bound for the integer-valued problem. (We use $b$ instead of $\beta$ from here on to refer to the parameter for the real-valued problem.) Campbell [7] noted that the optimal value of $F_b(p, l)$ for $b \in (-1, +\infty) \setminus \{0\}$ is the Rényi entropy of order $\alpha = (1 + b)^{-1}$:

$$H_\alpha(p) = \frac{1}{1 - \alpha} \log_2 \sum_{i \in \mathcal{X}} p_i^{\alpha}. \tag{2}$$

This should not be surprising given the relationship between Huffman coding and Shannon entropy, which corresponds to $b \to 0$, $H_1(p)$ [18].

Given $b \in (-1, +\infty)$ and $p$, the optimal ideal real-valued lengths achieving (2) are given by

$$l_i^\dagger = -\frac{1}{1 + b} \log_2 p_i + \log_2 \sum_{j \in \mathcal{X}} p_j^{\frac{1}{1+b}}. \tag{3}$$

At the extremes of the $(-1, +\infty)$ range, solutions are defined as the limit of the solutions for $b \downarrow -1$ and $b \uparrow +\infty$, respectively. For $b < -1$, there is no real-valued solution, the problem being optimized by $l_1^\dagger = 0$ and $l_i^\dagger = +\infty$ for every $i > 1$.
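Campbell's result and (3) can be checked numerically: for $b \in (-1, +\infty)$, $b \ne 0$, the ideal lengths of (3) satisfy the Kraft inequality with equality and attain $F_b(p, l) = H_\alpha(p)$ with $\alpha = 1/(1+b)$. The sketch below assumes a normalized $p$ (the Figure 1 weights, which happen to sum to 1, reused as probabilities); all function names are hypothetical.

```python
import math


def renyi_entropy(p, alpha):
    """Renyi entropy (2) of order alpha (alpha > 0, alpha != 1), in bits."""
    return math.log2(sum(q ** alpha for q in p)) / (1.0 - alpha)


def ideal_lengths(p, b):
    """Real-valued optimal lengths l_i^dagger of (3), for b in (-1, +inf)."""
    alpha = 1.0 / (1.0 + b)
    log_s = math.log2(sum(q ** alpha for q in p))
    return [-alpha * math.log2(q) + log_s for q in p]


def exponential_average(p, lengths, b):
    """F_b(p, l) = (1/b) log2 sum_i p_i 2^(b l_i), as in (1); requires b != 0."""
    return math.log2(sum(q * 2.0 ** (b * l) for q, l in zip(p, lengths))) / b


p = [0.36, 0.30, 0.20, 0.14]
b = 0.5
l_dag = ideal_lengths(p, b)
# The two printed values coincide: the ideal lengths attain the Renyi entropy.
print(exponential_average(p, l_dag, b), renyi_entropy(p, 1.0 / (1.0 + b)))
print(sum(2.0 ** -l for l in l_dag))  # Kraft sum of the ideal lengths
```

Because the lengths are real-valued, they only lower-bound the integer problem; rounding them up gives a valid but generally suboptimal code.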



III. MINIMIZATION OF d-AVERAGE b-REDUNDANCY

We call the difference between an integer length $l_i$ and the optimal real-valued solution the pointwise b-redundancy

$$r_b(i) = l_i - l_i^\dagger,$$

to emphasize its dependence on $b$. The arithmetic average 
of pointwise 0-redundancy was the problem considered by 
Huffman in his original paper, "A Method for the Construc- 
tion of Minimum-Redundancy Codes." Here we introduce a 
generalization of this problem encompassing several cases of 
interest. 

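Suppose we wish to minimize d-average b-redundancy or DABR,

$$R_{b,d}(p, l) = \frac{1}{d} \log_2 \sum_{i \in \mathcal{X}} p_i 2^{d\, r_b(i)}. \tag{4}$$

This amounts to finding $l^*_{b,d}(p)$ such that

$$\begin{aligned} R_{b,d}(p, l^*_{b,d}(p)) &= \min_l R_{b,d}(p, l) \\ &= \min_l \frac{1}{d} \log_2 \sum_{i \in \mathcal{X}} p_i 2^{d(l_i - l_i^\dagger)} \\ &= \min_l \frac{1}{d} \log_2 \sum_{i \in \mathcal{X}} p_i^{\frac{1+b+d}{1+b}} 2^{d l_i} - \log_2 \sum_{j \in \mathcal{X}} p_j^{\frac{1}{1+b}} \end{aligned} \tag{5}$$

where $l$ is restricted to the integers and by the Kraft inequality (implicit from here on).

Since the last term of (5) does not depend on $l$, this reduces to an exponential Huffman coding problem with weights $w_i = p_i^{(1+b+d)/(1+b)}$ and parameter $\beta = d$. Then, given sorted $\{p_i\}$, (5) is solvable in linear time; note that the normalization of the terms is optional for the algorithm. For $d < -1$, the solution is always the unary code $l = (1, 2, \ldots, n-1, n-1)$. Considering the edges via limiting (as we did with real-valued solutions), the range of nontrivial cases for minimal DABR codes for a given probability mass function can thus be considered to be parameterized by $b \times d \in [-1, +\infty] \times [-1, +\infty]$, as in Figure 2.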
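The reduction of (5) can be sanity-checked by brute force on a small example. The sketch below is our own illustration: it computes $R_{b,d}(p, l)$ directly from (4), obtains lengths from exponential Huffman coding with weights $p_i^{(1+b+d)/(1+b)}$ and $\beta = d$, and compares the result against an exhaustive search over all length vectors with entries 1 to $n-1$ that satisfy the Kraft inequality (this set contains the Huffman-like solution, so its minimum is the true optimum). It assumes $b > -1$, $d \ne 0$, and a normalized $p$; the probabilities are those of Figure 5, and the function names are hypothetical.

```python
import heapq
import itertools
import math


def exponential_huffman_lengths(weights, beta):
    """Combine the two smallest weights into 2**beta * (sum); return lengths."""
    lengths = [0] * len(weights)
    tie = itertools.count()
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w_j, _, a = heapq.heappop(heap)
        w_k, _, c = heapq.heappop(heap)
        for i in a + c:
            lengths[i] += 1
        heapq.heappush(heap, (2.0 ** beta * (w_j + w_k), next(tie), a + c))
    return lengths


def dabr(p, lengths, b, d):
    """R_{b,d}(p, l) of (4): (1/d) log2 sum_i p_i 2^(d r_b(i)), with d != 0."""
    alpha = 1.0 / (1.0 + b)
    log_s = math.log2(sum(q ** alpha for q in p))
    r = [l + alpha * math.log2(q) - log_s for q, l in zip(p, lengths)]  # r_b(i)
    return math.log2(sum(q * 2.0 ** (d * x) for q, x in zip(p, r))) / d


def min_dabr_lengths(p, b, d):
    """Reduction of (5): exponential Huffman on p_i^((1+b+d)/(1+b)), beta = d."""
    w = [q ** ((1.0 + b + d) / (1.0 + b)) for q in p]
    return exponential_huffman_lengths(w, d)


def brute_force_min_dabr(p, b, d):
    """Exhaustive minimum of R_{b,d} over lengths in 1..n-1 obeying Kraft."""
    n = len(p)
    best = math.inf
    for l in itertools.product(range(1, n), repeat=n):
        if sum(2.0 ** -x for x in l) <= 1.0:
            best = min(best, dabr(p, l, b, d))
    return best


p = [0.58, 0.12, 0.11, 0.1, 0.09]
l = min_dabr_lengths(p, 1.0, 1.0)
print(l, dabr(p, l, 1.0, 1.0), brute_force_min_dabr(p, 1.0, 1.0))
```

The exhaustive search is exponential in $n$ and is only meant as a check; the reduction itself needs just one Huffman-like pass.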

As indicated in this figure, many interesting coding problems fit within this framework. These problems, which we discuss below, correspond to subsets of this two-dimensional extended quadrant ($[-1, +\infty] \times [-1, +\infty]$). On the set of points for which $b$ is $+\infty$, for example, the minimization reduces to exponential Huffman coding with parameter $\beta = d$. For $d = 0$ ($b \in (-1, +\infty]$) we have Huffman coding. A particular type of Huffman coding occurs for $b = +\infty$, $d \downarrow 0$. In such a case, every $l_i^\dagger$ equals $\log_2 n$, and we note that

$$R_{+\infty,d}(p, l) + \log_2 n = \sum_{i \in \mathcal{X}} p_i l_i + \frac{d \ln 2}{2}\, \sigma^2(l) + O(d^2) \quad \text{as } d \downarrow 0 \tag{6}$$

where $\sigma^2(l) = \sum_{i} p_i l_i^2 - \left(\sum_{i} p_i l_i\right)^2$, so the second term on the right-hand side represents variance. This being the tie-breaking term, we have bottom-merge Huffman coding.

IEEE TRANSACTIONS ON INFORMATION THEORY

[Figure 1: items $m_1, m_2, m_3, m_4$ with weights 0.36, 0.30, 0.20, 0.14 receive codewords 00, 01, 10, 11 (all of length 2); successive merged weights 0.374, 0.726, and 1.21.]

Fig. 1. Exponential Huffman coding for weights $w = (0.36, 0.30, 0.20, 0.14)$ and $\beta = \log_2 1.1$. In the first step, the two smallest items (with weights $w_3 = 0.20$ and $w_4 = 0.14$) are combined into a compound item with weight $0.374 = 1.1 \cdot (0.20 + 0.14)$; thus $c_3$ and $c_4$ end in 0 and 1, respectively. At each additional step, the two smallest remaining items are combined in a similar fashion. In this manner a code to optimize (and as a by-product calculate) the value of $\sum_{i \in \mathcal{X}} w_i 2^{\beta l_i}$ is built bottom up. In this case, the minimized value is 1.21.
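Expansion (6) can be checked numerically for small $d$. The sketch below, our own illustration, evaluates $R_{+\infty,d}(p, l)$ exactly (at $b = +\infty$ every ideal length equals $\log_2 n$) and compares it with the right-hand side of (6) minus $\log_2 n$, without the $O(d^2)$ term. It assumes $p = (0.4, 0.2, 0.2, 0.1, 0.1)$ and two Huffman-optimal length vectors for it, $(2, 2, 2, 3, 3)$ and $(1, 2, 3, 4, 4)$, which share the average length 2.2 but differ in variance; function names are hypothetical.

```python
import math


def dabr_b_inf(p, lengths, d):
    """Exact R_{+inf,d}(p, l): at b = +inf every ideal length is log2 n."""
    shift = math.log2(len(p))
    return math.log2(sum(q * 2.0 ** (d * (l - shift)) for q, l in zip(p, lengths))) / d


def second_order(p, lengths, d):
    """Mean length - log2 n + (d ln 2 / 2) * variance, i.e. (6) without O(d^2)."""
    mean = sum(q * l for q, l in zip(p, lengths))
    var = sum(q * l * l for q, l in zip(p, lengths)) - mean ** 2
    return mean - math.log2(len(p)) + d * math.log(2) / 2 * var


p = [0.4, 0.2, 0.2, 0.1, 0.1]
d = 0.01
for lengths in ([2, 2, 2, 3, 3], [1, 2, 3, 4, 4]):
    print(lengths, dabr_b_inf(p, lengths, d), second_order(p, lengths, d))
```

At $d = 0.01$ the exact and approximate values agree to well within $10^{-4}$, and the lower-variance lengths give the smaller $R$, which is why the variance term acts as the tie-breaker.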



IV. MINIMIZATION OF MAXIMAL POINTWISE REDUNDANCY 

As average pointwise (0-)redundancy has been well understood for some time, Drmota and Szpankowski decided to explore the previously overlooked minimization of maximal pointwise redundancy [9], [19].

We define dth exponential redundancy as DABR for $b = 0$. Note that the maximal redundancy problem is equivalent to minimizing dth exponential redundancy as $d \to +\infty$. Thus, considering $d \in [0, +\infty]$, dth exponential redundancy is a subproblem with a parameter that varies solution values between minimizing average redundancy (Huffman coding) and minimizing maximal redundancy; such a range of problems and solutions was sought in [19]. This was previously derived axiomatically without regard to such a range and without solution [8]. The version of the minimal DABR coding solution applying to the maximal redundancy subproblem was found shortly thereafter [4], although it was not generalized to $b \ne 0$ or to $d = +\infty$.

Drmota and Szpankowski presented a simple method for finding a code with minimum maximal redundancy [9], [19]. However, this solution is deficient in the following senses: First, time complexity is $O(n \log n)$. Second, the Kraft inequality is not necessarily satisfied with equality, meaning that the optimal code found in this manner is often, in some sense, wasteful. Third, the code does not necessarily optimize dth exponential redundancy for any $d < +\infty$. The method is also not generalized to maximal b-redundancy ($b \ne 0$).



In order to overcome the first two deficiencies, we propose a reduction to a linear-time algorithm previously discussed by Parker [4]. This problem was termed the tree-height measure problem, though it was not previously considered in the context of the maximal redundancy or DABR problems.

The tree-height measure problem minimizes the maximum value of $w_i + c \cdot l_i$ given $c > 0$ and weight vector $w$. Instead of using $\tilde{w}_j = 2^{\beta}(w_j + w_k)$ on the merge step of Huffman coding, the Huffman-like tree-height measure algorithm uses $\tilde{w}_j = c + \max(w_j, w_k)$. In order to use the tree-height measure algorithm, assign weights according to

$$w_i(b) = \frac{1}{1+b} \log_2 \frac{p_i}{p_n},$$

which is always nonnegative, and let $c = 1$. Then this modified Huffman algorithm minimizes

$$\max_i \left(w_i(b) + c \cdot l_i\right) = \max_i r_b(i) + \log_2 \sum_{j \in \mathcal{X}} p_j^{\frac{1}{1+b}} - \frac{1}{1+b} \log_2 p_n,$$

in which the last two terms do not depend on $l$. Thus this linear-time algorithm returns a length vector minimizing maximum b-redundancy and satisfying the Kraft inequality with equality.
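The reduction to the tree-height measure problem can be sketched as follows. This is an illustration under our own assumptions (a heap instead of a linear-time queue, $p$ sorted so that $p_n$ is the smallest probability, hypothetical function names): it merges the two smallest weights into $c + \max(w_j, w_k)$, uses the weights $w_i(b) = \log_2(p_i/p_n)/(1+b)$ with $c = 1$, and reports the resulting maximum b-redundancy.

```python
import heapq
import itertools
import math


def tree_height_lengths(weights, c=1.0):
    """Tree-height measure algorithm: merge two smallest into c + max(w_j, w_k)."""
    lengths = [0] * len(weights)
    tie = itertools.count()
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w_j, _, a = heapq.heappop(heap)
        w_k, _, e = heapq.heappop(heap)
        for i in a + e:
            lengths[i] += 1
        heapq.heappush(heap, (c + max(w_j, w_k), next(tie), a + e))
    return lengths


def max_b_redundancy(p, lengths, b):
    """max_i r_b(i) = max_i (l_i - l_i^dagger), with l^dagger from (3)."""
    alpha = 1.0 / (1.0 + b)
    log_s = math.log2(sum(q ** alpha for q in p))
    return max(l + alpha * math.log2(q) - log_s for q, l in zip(p, lengths))


def min_max_redundancy_lengths(p, b=0.0):
    """Weights w_i(b) = log2(p_i / p_n) / (1 + b) >= 0 and c = 1."""
    p_n = min(p)  # smallest probability (p_n when p is sorted nonincreasingly)
    w = [math.log2(q / p_n) / (1.0 + b) for q in p]
    return tree_height_lengths(w, c=1.0)


p = [8 / 19, 4 / 19, 3 / 19, 2 / 19, 2 / 19]
l = min_max_redundancy_lengths(p)
print(l, max_b_redundancy(p, l, 0.0), sum(2.0 ** -x for x in l))
```

For $p = (8, 4, 3, 2, 2)/19$ and $b = 0$ the maximum redundancy of the returned code is $\log_2(32/19) \approx 0.752$, and its lengths satisfy the Kraft inequality with equality.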

Because ties can occur in selecting weights to combine, the exponential Huffman algorithm might yield one of many possible optimal codes, including codes not optimal for the limit of dth exponential redundancy (as $d \to +\infty$). For example, consider $p = (8/19, 4/19, 3/19, 2/19, 2/19)$. For dth exponential redundancy, $l = (1, 2, 3, 4, 4)$ and $l = (1, 3, 3, 3, 3)$ are both optimal for $d \to +\infty$. These not only minimize maximal redundancy, but, among codes that optimize this, these codes also have the lowest probability of achieving this maximal redundancy, as this is related to the second term of the expansion of $R_{b,d}(p, l)$ for $d \to +\infty$:

$$R_{b,d}(p, l) = \max_i r_b(i) + \frac{1}{d} \log_2 P_X\left[r_b(X) = \max_j r_b(j)\right] + o\!\left(\frac{1}{d}\right). \tag{7}$$

Fig. 2. The parameter space for minimal d-average b-redundancy (DABR) coding with the following noted subproblems: the Huffman problem (the line at $d = 0$), Campbell's exponential coding problem (line at $b = +\infty$), the problem solved via Schwartz's bottom-merge Huffman coding method (limit point at $(+\infty, 0)$ when approached from above), Nath's dth exponential redundancy problem (line at $b = 0$), Drmota and Szpankowski's maximal redundancy (point at $(0, +\infty)$), and maximal b-redundancy (line at $d = +\infty$). Each point in the extended quadrant represents a different (parameterized) problem, as in Figure [



Each term in the expansion has a different asymptotic complexity. As with minimum variance (bottom-merge) Huffman coding (6), each additional term further restricts the set of feasible codes to those that minimize the current term given the optimization of previous terms. In the above example, all terms are minimized by both the aforementioned sets of lengths. In contrast, $l = (2, 2, 2, 3, 3)$, although also minimizing maximal redundancy, results in a code where codewords have a higher probability of achieving maximal redundancy. This solution, which is in some sense inferior, can nevertheless be achieved by the tree-height measure algorithm, specifically the bottom-merge version.
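Expansion (7) can be illustrated with three length vectors that all achieve the same maximal redundancy. Assuming $p = (8, 4, 3, 2, 2)/19$, the distribution used in Figures 3 and 4, the sketch computes the maximal 0-redundancy and the probability of attaining it for $(1, 2, 3, 4, 4)$, $(1, 3, 3, 3, 3)$, and $(2, 2, 2, 3, 3)$, and compares $R_{0,d}$ at $d = 200$ with the first two terms of (7). Function names are hypothetical, and the tolerance used to detect tied maxima is our own choice.

```python
import math


def redundancy_profile(p, lengths, b=0.0, tol=1e-9):
    """Return (max_i r_b(i), P[r_b(X) = max_j r_b(j)]) for a length vector."""
    alpha = 1.0 / (1.0 + b)
    log_s = math.log2(sum(q ** alpha for q in p))
    r = [l + alpha * math.log2(q) - log_s for q, l in zip(p, lengths)]
    m = max(r)
    # tol absorbs floating-point rounding between mathematically equal maxima
    return m, sum(q for q, x in zip(p, r) if x > m - tol)


def dabr(p, lengths, b, d):
    """R_{b,d}(p, l) of (4), with d != 0."""
    alpha = 1.0 / (1.0 + b)
    log_s = math.log2(sum(q ** alpha for q in p))
    return math.log2(sum(q * 2.0 ** (d * (l + alpha * math.log2(q) - log_s))
                         for q, l in zip(p, lengths))) / d


p = [8 / 19, 4 / 19, 3 / 19, 2 / 19, 2 / 19]
d = 200.0
for lengths in ([1, 2, 3, 4, 4], [1, 3, 3, 3, 3], [2, 2, 2, 3, 3]):
    m, prob = redundancy_profile(p, lengths)
    # (7): R_{0,d} is close to max redundancy + (1/d) log2 P[max attained]
    print(lengths, m, prob * 19, dabr(p, lengths, 0.0, d) - (m + math.log2(prob) / d))
```

All three reach the maximal redundancy $\log_2(32/19)$, but with probabilities 4/19, 4/19, and 8/19, respectively, so the last one has the larger $R_{0,d}$ for large $d$.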

It is possible to find a $D \in \mathbb{R}$ such that, for every $d > D$, dth exponential Huffman coding minimizes maximal redundancy. Let $\min^+_{i,j} \gamma_{i,j}$ denote the minimum strictly positive value of $\gamma_{i,j}$, and let $\langle x \rangle$ denote the fractional part of $x$, i.e., $\langle x \rangle = x - \lfloor x \rfloor$. Assign $\delta = \min^+ \ldots$. It is possible to show 



that a sufficiently large D is given by D = jlogj^ > 1 
[20, pp. 59-62]. However, finding D requires sorting, so an 
algorithm derived from this D would not be a linear-time 
algorithm. 

Fortunately, it is possible to arrive at a linear-time algebraic 
Huffman algorithm, that is, one that keeps D as a variable. 
Algebraic Huffman algorithms were introduced by Knuth [5]. 
The one proposed here uses a Huffman algorithm which keeps 
track of both the first- and second-order terms; ties between 
these pairs of terms can occur only when all terms are tied, 
this due to the manner in which the Huffman procedure 
works. Before explaining why this is the case, we present the 
algorithm. 

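The aforementioned first- and second-order terms are

$$w'_i = \lim_{d \to +\infty} \left[w_i(b, d)\right]^{1/d}$$

and

$$w''_i = \lim_{d \to +\infty} \left[w_i(b, d)\right] \cdot \left(w'_i\right)^{-d},$$

respectively, where leaf nodes have

$$w_i(b, d) = p_i^{\frac{1+b+d}{1+b}}$$

as in d-average b-redundancy.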

One can think of $w'_i$ as representing an invertible function of maximal b-redundancy,

$$w'_i = 2^{r_b(i)} \sum_{j \in \mathcal{X}} p_j^{\frac{1}{1+b}},$$

where, at any given point of the algorithm, $r_b(i) = l_i - l_i^\dagger$ uses the depth of item $i$ in its interim code tree as the value $l_i$. Note that only $r_b(i)$ is variable; the constant factor $\sum_j p_j^{1/(1+b)}$ in $w'_i$ is a result of not normalizing the weights at the start of the algorithm. In a similar manner, $w''_i$ represents the probability of maximal b-redundancy $P_X[r_b(X) = \max_j r_b(j)]$.

To implement this algorithm, we let $w'_i = p_i^{1/(1+b)}$ and $w''_i = p_i$ for the initial case. In comparing items $j$ and $k$, we consider them as lexicographically ordered pairs, e.g., $w_j = (w'_j, w''_j)$, so that $w_j > w_k$ if and only if either $w'_j > w'_k$, or $w'_j = w'_k$ and $w''_j > w''_k$, as in [5]. In combining items $j$ and $k$ (where $w_j \ge w_k$ as described), the new item will have $\tilde{w}'_j = 2 \cdot \max(w'_j, w'_k) = 2 w'_j$. If $w'_j > w'_k$, then $\tilde{w}''_j = w''_j$; otherwise, $\tilde{w}''_j = w''_j + w''_k$. That is,

$$\tilde{w}_j = \begin{cases} (2 w'_j,\ w''_j) & \text{if } w'_j > w'_k \\ (2 w'_j,\ w''_j + w''_k) & \text{otherwise.} \end{cases}$$



The reasons for this are easily seen if we view $w_i$ as the representation of maximal redundancy and the probability that this maximal redundancy is achieved. Take the maximum and add 1 for the additional bit of the codeword (multiplying $w'_i$ by 2). Then, if the redundancies are identical, add their probabilities ($w''_i$). Otherwise, take the probability of the maximal redundancy.
Fig. 3. Algebraic maximal redundancy coding, $p = \frac{1}{19} \cdot (8, 4, 3, 2, 2)$ (bottom-merge)

This combining method is a Huffman algebra, satisfying the properties introduced in [5]. The Huffman combining criterion is shown by example in Figure 3. The remaining weight pair after coding, $(32/19, 4/19)$, indicates a maximal redundancy of $\log_2(32/19)$ and a probability of $4/19$ that this redundancy is achieved.
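The pair-valued (algebraic) combining rule can be sketched as below. It is a minimal illustration under our own assumptions: a heap ordered lexicographically on $(w', w'')$, an insertion counter that makes bottom-merge or top-merge tie-breaking explicit, and exact floating-point comparison of first components, which is adequate for the example here (all ratios are powers of two) but not a general-purpose choice. The function name is hypothetical.

```python
import heapq
import itertools


def algebraic_max_redundancy(p, b=0.0, top_merge=False):
    """Huffman coding on pairs (w', w''), compared lexicographically.

    Leaves start at (p_i^(1/(1+b)), p_i). Combining the smaller item with the
    larger one gives (2 w'_large, w''_large), or (2 w'_large, w''_small +
    w''_large) when the first components tie. Returns (lengths, final pair).
    """
    lengths = [0] * len(p)
    stamp = itertools.count()

    def key(pair):
        s = next(stamp)
        return pair + ((-s if top_merge else s),)  # newer items win ties in top-merge

    heap = [(key((q ** (1.0 / (1.0 + b)), q)), [i]) for i, q in enumerate(p)]
    heapq.heapify(heap)
    while len(heap) > 1:
        (w1_s, w2_s, _), small = heapq.heappop(heap)
        (w1_l, w2_l, _), large = heapq.heappop(heap)
        for i in small + large:
            lengths[i] += 1
        w2 = w2_s + w2_l if w1_s == w1_l else w2_l  # exact equality test
        heapq.heappush(heap, (key((2.0 * w1_l, w2)), small + large))
    (w1, w2, _), _ = heap[0]
    return lengths, (w1, w2)


p = [8 / 19, 4 / 19, 3 / 19, 2 / 19, 2 / 19]
print(algebraic_max_redundancy(p))
print(algebraic_max_redundancy(p, top_merge=True))
```

For $p = (8, 4, 3, 2, 2)/19$ and $b = 0$ the final pair is $(32/19, 4/19)$. By our own trace, the bottom-merge variant returns lengths $(1, 3, 3, 3, 3)$ and the top-merge variant returns $(1, 2, 3, 4, 4)$, both of which minimize maximal redundancy with the lower probability 4/19.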

We now show that ties in the w pairs imply ties in all terms of the expansion presented in (7), or, equivalently, ties in dth exponential redundancy for all $d \in [D, +\infty)$, where $D$ is some (unspecified) constant.

Theorem 1: If there is a tie in the above w pairs, there is a tie in all terms of the corresponding d expansion.

Proof: Consider two tied pairs. Note that, in each,

$$w'_i \ge \left(w''_i\right)^{\frac{1}{1+b}} \tag{8}$$

because this holds with equality in leaf nodes and the inequality is preserved in the merge step, since $2 \cdot \max(x, y) \ge x + y \ge \max(x, y)$ for $x, y > 0$. If inequality (8) holds without equality for the tied pairs, neither node on the corresponding code tree can be a leaf node, and, due to ordering for the combination step, their four children must be identically weighted. However, this fact can be invoked inductively for either pair of children, also tied, and thus such a tree could not be finite. Therefore, tied pairs arise only in cases for which the inequality holds with equality. Thus, they must be leaf nodes or nodes with two identically-weighted children. Inductively, this means the subtrees must be composed of leaf nodes that are relatively dyadic, that is, are dyadic when multiplied by a nontrivial common constant. Thus they are equal in all terms, which is what we set out to show. ■

One can use bottom-merge or top-merge coding so that the 
algorithm is deterministic. If one uses top-merge coding — that 
is, favoring combined items over single items with identical 
weight [13] — one actually need not keep track of the second 
term; the top-merge algorithm behaves identically without 
considering this term. This variant, illustrated in Figure 4, is 
actually a special case of the tree-height measure problem men- 
tioned above. However, if we wish to assure that the solution 
has minimum variance, the algebraic method is needed. 



V. Bounds 



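One can easily see that if we relax the integer constraint on length for minimizing d-average b-redundancy, the real-valued solution is not $l^\dagger$ but some different $l^\ddagger$. By substituting the real-valued exponential solution into (5), we find

$$l_i^\ddagger = -\omega \log_2 p_i + \log_2 \sum_{j \in \mathcal{X}} p_j^{\omega}$$

where

$$\omega = \frac{1 + b + d}{(1 + b)(1 + d)} = 1 - \frac{bd}{(1 + b)(1 + d)}.$$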

Note that when the values of $b$ and $d$ are exchanged, the ideal solution remains the same. This problem thus has a high degree of symmetry. However, because the problem itself is not symmetric, the symmetry of integer solutions is not perfect, as we can see in Figure 5.

Using this and the Shannon code [18] analogue, we can find bounds for the optimal DABR when $b > -1$, $d > -1$, and $b + d > -1$:

$$0 \le R_{b,d}(p, l^*_{b,d}(p)) - \frac{b}{1+b}\left(H_\omega(p) - H_\alpha(p)\right) < 1$$

where we recall $\omega = \frac{1+b+d}{(1+b)(1+d)}$ and $\alpha = \frac{1}{1+b}$, the subscript of Rényi entropy in (2). As with exponential Huffman coding, equality holds iff the ideal solution has all integer lengths. For $b = +\infty$ and $d = 0$, this results in the well-known Shannon bounds. For $b = 0$, it reduces to a normalized version of an inequality in [8]. With a different normalization, this 
inequality relates to Renyi's gain of information of order q, a 
generalization of relative entropy [21]. This is not surprising 
given the relationship between relative entropy and Huffman 
coding noted by Longo and Galasso [22]. 
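The bound can be spot-checked numerically. The sketch below, our own illustration, computes the optimal integer DABR through the exponential Huffman reduction of (5) and subtracts $\frac{b}{1+b}(H_\omega(p) - H_\alpha(p))$; the bound predicts a value in $[0, 1)$. It assumes $b \ne 0$ and $d \ne 0$ (so that $\alpha \ne 1$ and $\omega \ne 1$), $b > -1$, $d > -1$, $b + d > -1$, and uses the probabilities of Figure 5 with a few $(b, d)$ pairs of our choosing; the function names are hypothetical.

```python
import heapq
import itertools
import math


def exponential_huffman_lengths(weights, beta):
    """Combine the two smallest weights into 2**beta * (sum); return lengths."""
    lengths = [0] * len(weights)
    tie = itertools.count()
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w_j, _, a = heapq.heappop(heap)
        w_k, _, c = heapq.heappop(heap)
        for i in a + c:
            lengths[i] += 1
        heapq.heappush(heap, (2.0 ** beta * (w_j + w_k), next(tie), a + c))
    return lengths


def renyi_entropy(p, order):
    """Renyi entropy of the given order (order > 0, order != 1), in bits."""
    return math.log2(sum(q ** order for q in p)) / (1.0 - order)


def dabr(p, lengths, b, d):
    """R_{b,d}(p, l) of (4), with d != 0."""
    alpha = 1.0 / (1.0 + b)
    log_s = math.log2(sum(q ** alpha for q in p))
    return math.log2(sum(q * 2.0 ** (d * (l + alpha * math.log2(q) - log_s))
                         for q, l in zip(p, lengths))) / d


def bound_gap(p, b, d):
    """R_{b,d}(p, l*) - b/(1+b) (H_omega - H_alpha); the bound says 0 <= gap < 1."""
    alpha = 1.0 / (1.0 + b)
    omega = (1.0 + b + d) / ((1.0 + b) * (1.0 + d))
    weights = [q ** ((1.0 + b + d) / (1.0 + b)) for q in p]
    l_opt = exponential_huffman_lengths(weights, d)  # reduction of (5)
    return dabr(p, l_opt, b, d) - b / (1.0 + b) * (
        renyi_entropy(p, omega) - renyi_entropy(p, alpha))


p = [0.58, 0.12, 0.11, 0.1, 0.09]
for b, d in [(1.0, 1.0), (0.5, 2.0), (2.0, -0.5), (-0.5, 3.0), (3.0, 0.7)]:
    print(b, d, bound_gap(p, b, d))
```

Each printed gap should lie in $[0, 1)$; values near 0 indicate that the ideal lengths $l^\ddagger$ are close to integers.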

Due to the reduction to exponential Huffman coding, more 
sophisticated redundancy results may be applied if desired. 
The bounds given by Blumer and McEliece [23] apply to the 
exponential case but appear as solutions to related problems 
rather than in closed form. Taneja [24] gave closed-form 
bounds using an alternative definition of redundancy. 
Fig. 4. Top-merge maximal redundancy coding, $p = \frac{1}{19} \cdot (8, 4, 3, 2, 2)$ (single variable)

Fig. 5. The parameter space for minimal DABR coding for $p = (0.58, 0.12, 0.11, 0.1, 0.09)$. Each region represents a set of problems with the same solution. On the transition curves (solid), multiple solutions are optimal. The five distinct solution regions are (1) $l = (1, 2, 3, 4, 4)$, (2) $l = (1, 3, 3, 3, 3)$, (3) $l = (2, 2, 2, 3, 3)$, (4) $l = (4, 4, 3, 2, 1)$, and (5) $l = (3, 3, 2, 2, 2)$. The dotted lines within the parameter space indicate the ($+\infty$) asymptotic behavior of the limits between regions. Note that $b + d + 1 = 0$ divides nonincreasing length vectors from nondecreasing. Also note the high degree of symmetry; the imperfection of this symmetry is best illustrated by the different asymptotes.



VI. Conclusion 

A two-dimensional framework is demonstrated to encompass 
examples considered by Parker [4] — classical Huffman cod- 
ing [1], the exponential variant proposed by Campbell [7], and 
the dth exponential redundancy problem proposed by Nath [8]. 
These examples, along with all problems within the framework, 
are solvable by Huffman-like algorithms. The maximal redun- 
dancy problem proposed by Drmota and Szpankowski [9], [19] 
is shown to be optimized by its equivalence to another example 



considered by Parker; the top-merge version of the algorithm 
in particular additionally optimizes dth exponential redundancy 
for large d. A better solution — one minimizing codeword 
length variance among such optimal codes — is suggested 
by and is developed from the two-dimensional framework 
introduced here. All algorithms discussed are Huffman-like 
and thus linear-time given sorted input, unlike the original 
algorithm proposed for maximal redundancy. 

It is unclear whether all nontrivial problems within Parker's 
more general framework are covered by this seemingly more 
specific framework and trivial extensions thereof. Such analy- 
sis, building upon Parker's work, could be a basis for further 
research. Extending this algorithm to alphabetic codes (alpha- 
betic search trees) could also be explored. For nonnegative 
exponents (d > 0), this framework is a trivial extension of [3], 
but negative exponents might provide more of a challenge. 